This report explores the Prosper loans dataset. According to Wikipedia,
Prosper Marketplace is a company in the peer-to-peer lending industry and
operates Prosper.com. Borrowers request personal loans on
Prosper and investors can fund them. Prosper lists, collects and distributes
borrower payments and interest back to the loan investors.
The dataset contains 113,937 loans with 81 variables on each loan. The variable definitions are listed in Prosper Loan Data - Variable Definitions.
In this section I will explore the full dataset. After initial analysis, I
will choose 10-15 variables to explore further.
## [1] 113937 81
## ListingNumber Term BorrowerAPR BorrowerRate
## Min. : 4 Min. :12.00 Min. :0.00653 Min. :0.0000
## 1st Qu.: 400919 1st Qu.:36.00 1st Qu.:0.15629 1st Qu.:0.1340
## Median : 600554 Median :36.00 Median :0.20976 Median :0.1840
## Mean : 627886 Mean :40.83 Mean :0.21883 Mean :0.1928
## 3rd Qu.: 892634 3rd Qu.:36.00 3rd Qu.:0.28381 3rd Qu.:0.2500
## Max. :1255725 Max. :60.00 Max. :0.51229 Max. :0.4975
## NA's :25
## LenderYield EstimatedEffectiveYield EstimatedLoss
## Min. :-0.0100 Min. :-0.183 Min. :0.005
## 1st Qu.: 0.1242 1st Qu.: 0.116 1st Qu.:0.042
## Median : 0.1730 Median : 0.162 Median :0.072
## Mean : 0.1827 Mean : 0.169 Mean :0.080
## 3rd Qu.: 0.2400 3rd Qu.: 0.224 3rd Qu.:0.112
## Max. : 0.4925 Max. : 0.320 Max. :0.366
## NA's :29084 NA's :29084
## EstimatedReturn ProsperRating..numeric. ProsperScore
## Min. :-0.183 Min. :1.000 Min. : 1.00
## 1st Qu.: 0.074 1st Qu.:3.000 1st Qu.: 4.00
## Median : 0.092 Median :4.000 Median : 6.00
## Mean : 0.096 Mean :4.072 Mean : 5.95
## 3rd Qu.: 0.117 3rd Qu.:5.000 3rd Qu.: 8.00
## Max. : 0.284 Max. :7.000 Max. :11.00
## NA's :29084 NA's :29084 NA's :29084
## ListingCategory..numeric. EmploymentStatusDuration CreditScoreRangeLower
## Min. : 0.000 Min. : 0.00 Min. : 0.0
## 1st Qu.: 1.000 1st Qu.: 26.00 1st Qu.:660.0
## Median : 1.000 Median : 67.00 Median :680.0
## Mean : 2.774 Mean : 96.07 Mean :685.6
## 3rd Qu.: 3.000 3rd Qu.:137.00 3rd Qu.:720.0
## Max. :20.000 Max. :755.00 Max. :880.0
## NA's :7625 NA's :591
## CreditScoreRangeUpper CurrentCreditLines OpenCreditLines
## Min. : 19.0 Min. : 0.00 Min. : 0.00
## 1st Qu.:679.0 1st Qu.: 7.00 1st Qu.: 6.00
## Median :699.0 Median :10.00 Median : 9.00
## Mean :704.6 Mean :10.32 Mean : 9.26
## 3rd Qu.:739.0 3rd Qu.:13.00 3rd Qu.:12.00
## Max. :899.0 Max. :59.00 Max. :54.00
## NA's :591 NA's :7604 NA's :7604
## TotalCreditLinespast7years OpenRevolvingAccounts
## Min. : 2.00 Min. : 0.00
## 1st Qu.: 17.00 1st Qu.: 4.00
## Median : 25.00 Median : 6.00
## Mean : 26.75 Mean : 6.97
## 3rd Qu.: 35.00 3rd Qu.: 9.00
## Max. :136.00 Max. :51.00
## NA's :697
## OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries
## Min. : 0.0 Min. : 0.000 Min. : 0.000
## 1st Qu.: 114.0 1st Qu.: 0.000 1st Qu.: 2.000
## Median : 271.0 Median : 1.000 Median : 4.000
## Mean : 398.3 Mean : 1.435 Mean : 5.584
## 3rd Qu.: 525.0 3rd Qu.: 2.000 3rd Qu.: 7.000
## Max. :14985.0 Max. :105.000 Max. :379.000
## NA's :697 NA's :1159
## CurrentDelinquencies AmountDelinquent DelinquenciesLast7Years
## Min. : 0.0000 Min. : 0.0 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 0.0 1st Qu.: 0.000
## Median : 0.0000 Median : 0.0 Median : 0.000
## Mean : 0.5921 Mean : 984.5 Mean : 4.155
## 3rd Qu.: 0.0000 3rd Qu.: 0.0 3rd Qu.: 3.000
## Max. :83.0000 Max. :463881.0 Max. :99.000
## NA's :697 NA's :7622 NA's :990
## PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
## Min. : 0.0000 Min. : 0.000 Min. : 0
## 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 3121
## Median : 0.0000 Median : 0.000 Median : 8549
## Mean : 0.3126 Mean : 0.015 Mean : 17599
## 3rd Qu.: 0.0000 3rd Qu.: 0.000 3rd Qu.: 19521
## Max. :38.0000 Max. :20.000 Max. :1435667
## NA's :697 NA's :7604 NA's :7604
## BankcardUtilization AvailableBankcardCredit TotalTrades
## Min. :0.000 Min. : 0 Min. : 0.00
## 1st Qu.:0.310 1st Qu.: 880 1st Qu.: 15.00
## Median :0.600 Median : 4100 Median : 22.00
## Mean :0.561 Mean : 11210 Mean : 23.23
## 3rd Qu.:0.840 3rd Qu.: 13180 3rd Qu.: 30.00
## Max. :5.950 Max. :646285 Max. :126.00
## NA's :7604 NA's :7544 NA's :7544
## TradesNeverDelinquent..percentage. TradesOpenedLast6Months
## Min. :0.000 Min. : 0.000
## 1st Qu.:0.820 1st Qu.: 0.000
## Median :0.940 Median : 0.000
## Mean :0.886 Mean : 0.802
## 3rd Qu.:1.000 3rd Qu.: 1.000
## Max. :1.000 Max. :20.000
## NA's :7544 NA's :7544
## DebtToIncomeRatio StatedMonthlyIncome TotalProsperLoans
## Min. : 0.000 Min. : 0 Min. :0.00
## 1st Qu.: 0.140 1st Qu.: 3200 1st Qu.:1.00
## Median : 0.220 Median : 4667 Median :1.00
## Mean : 0.276 Mean : 5608 Mean :1.42
## 3rd Qu.: 0.320 3rd Qu.: 6825 3rd Qu.:2.00
## Max. :10.010 Max. :1750003 Max. :8.00
## NA's :8554 NA's :91852
## TotalProsperPaymentsBilled OnTimeProsperPayments
## Min. : 0.00 Min. : 0.00
## 1st Qu.: 9.00 1st Qu.: 9.00
## Median : 16.00 Median : 15.00
## Mean : 22.93 Mean : 22.27
## 3rd Qu.: 33.00 3rd Qu.: 32.00
## Max. :141.00 Max. :141.00
## NA's :91852 NA's :91852
## ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
## Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.00 Median : 0.00
## Mean : 0.61 Mean : 0.05
## 3rd Qu.: 0.00 3rd Qu.: 0.00
## Max. :42.00 Max. :21.00
## NA's :91852 NA's :91852
## ProsperPrincipalBorrowed ProsperPrincipalOutstanding
## Min. : 0 Min. : 0
## 1st Qu.: 3500 1st Qu.: 0
## Median : 6000 Median : 1627
## Mean : 8472 Mean : 2930
## 3rd Qu.:11000 3rd Qu.: 4127
## Max. :72499 Max. :23451
## NA's :91852 NA's :91852
## ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
## Min. :-209.00 Min. : 0.0
## 1st Qu.: -35.00 1st Qu.: 0.0
## Median : -3.00 Median : 0.0
## Mean : -3.22 Mean : 152.8
## 3rd Qu.: 25.00 3rd Qu.: 0.0
## Max. : 286.00 Max. :2704.0
## NA's :95009
## LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination LoanNumber
## Min. : 0.00 Min. : 0.0 Min. : 1
## 1st Qu.: 9.00 1st Qu.: 6.0 1st Qu.: 37332
## Median :14.00 Median : 21.0 Median : 68599
## Mean :16.27 Mean : 31.9 Mean : 69444
## 3rd Qu.:22.00 3rd Qu.: 65.0 3rd Qu.:101901
## Max. :44.00 Max. :100.0 Max. :136486
## NA's :96985
## LoanOriginalAmount MonthlyLoanPayment LP_CustomerPayments
## Min. : 1000 Min. : 0.0 Min. : -2.35
## 1st Qu.: 4000 1st Qu.: 131.6 1st Qu.: 1005.76
## Median : 6500 Median : 217.7 Median : 2583.83
## Mean : 8337 Mean : 272.5 Mean : 4183.08
## 3rd Qu.:12000 3rd Qu.: 371.6 3rd Qu.: 5548.40
## Max. :35000 Max. :2251.5 Max. :40702.39
##
## LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees
## Min. : 0.0 Min. : -2.35 Min. :-664.87
## 1st Qu.: 500.9 1st Qu.: 274.87 1st Qu.: -73.18
## Median : 1587.5 Median : 700.84 Median : -34.44
## Mean : 3105.5 Mean : 1077.54 Mean : -54.73
## 3rd Qu.: 4000.0 3rd Qu.: 1458.54 3rd Qu.: -13.92
## Max. :35000.0 Max. :15617.03 Max. : 32.06
##
## LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
## Min. :-9274.75 Min. : -94.2 Min. : -954.5
## 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 0.00 Median : 0.0 Median : 0.0
## Mean : -14.24 Mean : 700.4 Mean : 681.4
## 3rd Qu.: 0.00 3rd Qu.: 0.0 3rd Qu.: 0.0
## Max. : 0.00 Max. :25000.0 Max. :25000.0
##
## LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## Min. : 0.00 Min. :0.7000 Min. : 0.00000
## 1st Qu.: 0.00 1st Qu.:1.0000 1st Qu.: 0.00000
## Median : 0.00 Median :1.0000 Median : 0.00000
## Mean : 25.14 Mean :0.9986 Mean : 0.04803
## 3rd Qu.: 0.00 3rd Qu.:1.0000 3rd Qu.: 0.00000
## Max. :21117.90 Max. :1.0125 Max. :39.00000
##
## InvestmentFromFriendsCount InvestmentFromFriendsAmount Investors
## Min. : 0.00000 Min. : 0.00 Min. : 1.00
## 1st Qu.: 0.00000 1st Qu.: 0.00 1st Qu.: 2.00
## Median : 0.00000 Median : 0.00 Median : 44.00
## Mean : 0.02346 Mean : 16.55 Mean : 80.48
## 3rd Qu.: 0.00000 3rd Qu.: 0.00 3rd Qu.: 115.00
## Max. :33.00000 Max. :25000.00 Max. :1189.00
##
## ListingKey ListingCreationDate
## 17A93590655669644DB4C06: 6 2013-10-02 17:20:16.550000000: 6
## 349D3587495831350F0F648: 4 2013-08-28 20:31:41.107000000: 4
## 47C1359638497431975670B: 4 2013-09-08 09:27:44.853000000: 4
## 8474358854651984137201C: 4 2013-12-06 05:43:13.830000000: 4
## DE8535960513435199406CE: 4 2013-12-06 11:44:58.283000000: 4
## 04C13599434217079754AEE: 3 2013-08-21 07:25:22.360000000: 3
## (Other) :113912 (Other) :113912
## CreditGrade LoanStatus ClosedDate
## :84984 Current :56576 :58848
## C : 5649 Completed :38074 2014-03-04 00:00:00: 105
## D : 5153 Chargedoff :11992 2014-02-19 00:00:00: 100
## B : 4389 Defaulted : 5018 2014-02-11 00:00:00: 92
## AA : 3509 Past Due (1-15 days) : 806 2012-10-30 00:00:00: 81
## HR : 3508 Past Due (31-60 days): 363 2013-02-26 00:00:00: 78
## (Other): 6745 (Other) : 1108 (Other) :54633
## ProsperRating..Alpha. BorrowerState Occupation
## :29084 CA :14717 Other :28617
## C :18345 TX : 6842 Professional :13628
## B :15581 NY : 6729 Computer Programmer : 4478
## A :14551 FL : 6720 Executive : 4311
## D :14274 IL : 5921 Teacher : 3759
## E : 9795 : 5515 Administrative Assistant: 3688
## (Other):12307 (Other):67493 (Other) :55456
## EmploymentStatus IsBorrowerHomeowner CurrentlyInGroup
## Employed :67322 False:56459 False:101218
## Full-time :26355 True :57478 True : 12719
## Self-employed: 6134
## Not available: 5347
## Other : 3806
## : 2255
## (Other) : 2718
## GroupKey DateCreditPulled
## :100596 2013-12-23 09:38:12: 6
## 783C3371218786870A73D20: 1140 2013-11-21 09:09:41: 4
## 3D4D3366260257624AB272D: 916 2013-12-06 05:43:16: 4
## 6A3B336601725506917317E: 698 2014-01-14 20:17:49: 4
## FEF83377364176536637E50: 611 2014-02-09 12:14:41: 4
## C9643379247860156A00EC0: 342 2013-09-27 22:04:54: 3
## (Other) : 9634 (Other) :113912
## FirstRecordedCreditLine IncomeRange IncomeVerifiable
## : 697 $25,000-49,999:32192 False: 8669
## 1993-12-01 00:00:00: 185 $50,000-74,999:31050 True :105268
## 1994-11-01 00:00:00: 178 $100,000+ :17337
## 1995-11-01 00:00:00: 168 $75,000-99,999:16916
## 1990-04-01 00:00:00: 161 Not displayed : 7741
## 1995-03-01 00:00:00: 159 $1-24,999 : 7274
## (Other) :112389 (Other) : 1427
## LoanKey LoanOriginationDate
## CB1B37030986463208432A1: 6 2014-01-22 00:00:00: 491
## 2DEE3698211017519D7333F: 4 2013-11-13 00:00:00: 490
## 9F4B37043517554537C364C: 4 2014-02-19 00:00:00: 439
## D895370150591392337ED6D: 4 2013-10-16 00:00:00: 434
## E6FB37073953690388BC56D: 4 2014-01-28 00:00:00: 339
## 0D8F37036734373301ED419: 3 2013-09-24 00:00:00: 316
## (Other) :113912 (Other) :111428
## LoanOriginationQuarter MemberKey
## Q4 2013:14450 63CA34120866140639431C9: 9
## Q1 2014:12172 16083364744933457E57FB9: 8
## Q3 2013: 9180 3A2F3380477699707C81385: 8
## Q2 2013: 7099 4D9C3403302047712AD0CDD: 8
## Q3 2012: 5632 739C338135235294782AE75: 8
## Q2 2012: 5061 7E1733653050264822FAA3D: 8
## (Other):60343 (Other) :113888
##
## FALSE TRUE
## 20 61
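The FALSE/TRUE counts above add up to the 81 columns, so they look like a tally of non-numeric versus numeric variables. Below is a rough base R sketch of how such an overview can be produced. The `loans` data frame and its values are made up for illustration, and this is not necessarily the code behind the output above.

```r
# Toy data frame standing in for the loans data (all values are made up).
loans <- data.frame(
  ListingKey         = c("a1", "b2", "c3"),
  LoanOriginalAmount = c(4000, 6500, 12000),
  BorrowerRate       = c(0.134, 0.184, 0.25),
  LoanStatus         = c("Current", "Completed", "Current"),
  stringsAsFactors   = TRUE
)

# Number of rows and columns.
dim(loans)

# Flag numeric columns, then count non-numeric (FALSE) vs numeric (TRUE).
is_num <- sapply(loans, is.numeric)
type_counts <- table(is_num)
print(type_counts)

# Five-number summaries plus the mean, for the numeric columns only.
summary(loans[, is_num])
```

Restricting `summary()` to the numeric columns keeps the factor columns, which are summarised as level counts, out of the table.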
## ListingKey ListingNumber ListingCreationDate CreditGrade
## nbr.val NA 1.139370e+05 NA NA
## nbr.null NA 0.000000e+00 NA NA
## nbr.na NA 0.000000e+00 NA NA
## min NA 4.000000e+00 NA NA
## max NA 1.255725e+06 NA NA
## range NA 1.255721e+06 NA NA
## sum NA 7.153941e+10 NA NA
## median NA 6.005540e+05 NA NA
## mean NA 6.278857e+05 NA NA
## SE.mean NA 9.719466e+02 NA NA
## CI.mean NA 1.905000e+03 NA NA
## var NA 1.076340e+11 NA NA
## std.dev NA 3.280762e+05 NA NA
## coef.var NA 5.225095e-01 NA NA
## Term LoanStatus ClosedDate BorrowerAPR BorrowerRate
## nbr.val 1.139370e+05 NA NA 1.139120e+05 1.139370e+05
## nbr.null 0.000000e+00 NA NA 0.000000e+00 8.000000e+00
## nbr.na 0.000000e+00 NA NA 2.500000e+01 0.000000e+00
## min 1.200000e+01 NA NA 6.530000e-03 0.000000e+00
## max 6.000000e+01 NA NA 5.122900e-01 4.975000e-01
## range 4.800000e+01 NA NA 5.057600e-01 4.975000e-01
## sum 4.652076e+06 NA NA 2.492710e+04 2.196296e+04
## median 3.600000e+01 NA NA 2.097600e-01 1.840000e-01
## mean 4.083025e+01 NA NA 2.188277e-01 1.927641e-01
## SE.mean 3.091794e-02 NA NA 2.381098e-04 2.216543e-04
## CI.mean 6.059869e-02 NA NA 4.666916e-04 4.344391e-04
## var 1.089145e+02 NA NA 6.458385e-03 5.597798e-03
## std.dev 1.043621e+01 NA NA 8.036408e-02 7.481843e-02
## coef.var 2.556000e-01 NA NA 3.672483e-01 3.881348e-01
## LenderYield EstimatedEffectiveYield EstimatedLoss
## nbr.val 1.139370e+05 8.485300e+04 8.485300e+04
## nbr.null 1.000000e+01 1.000000e+00 0.000000e+00
## nbr.na 0.000000e+00 2.908400e+04 2.908400e+04
## min -1.000000e-02 -1.827000e-01 4.900000e-03
## max 4.925000e-01 3.199000e-01 3.660000e-01
## range 5.025000e-01 5.026000e-01 3.611000e-01
## sum 2.081640e+04 1.431143e+04 6.814193e+03
## median 1.730000e-01 1.615000e-01 7.240000e-02
## mean 1.827010e-01 1.686615e-01 8.030586e-02
## SE.mean 2.207578e-04 2.350442e-04 1.605364e-04
## CI.mean 4.326819e-04 4.606847e-04 3.146501e-04
## var 5.552605e-03 4.687770e-03 2.186827e-03
## std.dev 7.451580e-02 6.846729e-02 4.676352e-02
## coef.var 4.078566e-01 4.059451e-01 5.823177e-01
## EstimatedReturn ProsperRating..numeric. ProsperRating..Alpha.
## nbr.val 8.485300e+04 8.485300e+04 NA
## nbr.null 1.000000e+00 0.000000e+00 NA
## nbr.na 2.908400e+04 2.908400e+04 NA
## min -1.827000e-01 1.000000e+00 NA
## max 2.837000e-01 7.000000e+00 NA
## range 4.664000e-01 6.000000e+00 NA
## sum 8.151683e+03 3.455420e+05 NA
## median 9.170000e-02 4.000000e+00 NA
## mean 9.606830e-02 4.072243e+00 NA
## SE.mean 1.043721e-04 5.744090e-03 NA
## CI.mean 2.045685e-04 1.125837e-02 NA
## var 9.243489e-04 2.799688e+00 NA
## std.dev 3.040311e-02 1.673227e+00 NA
## coef.var 3.164739e-01 4.108859e-01 NA
## ProsperScore ListingCategory..numeric. BorrowerState Occupation
## nbr.val 8.485300e+04 1.139370e+05 NA NA
## nbr.null 0.000000e+00 1.696500e+04 NA NA
## nbr.na 2.908400e+04 0.000000e+00 NA NA
## min 1.000000e+00 0.000000e+00 NA NA
## max 1.100000e+01 2.000000e+01 NA NA
## range 1.000000e+01 2.000000e+01 NA NA
## sum 5.048810e+05 3.160850e+05 NA NA
## median 6.000000e+00 1.000000e+00 NA NA
## mean 5.950067e+00 2.774209e+00 NA NA
## SE.mean 8.158388e-03 1.184076e-02 NA NA
## CI.mean 1.599038e-02 2.320771e-02 NA NA
## var 5.647756e+00 1.597438e+01 NA NA
## std.dev 2.376501e+00 3.996797e+00 NA NA
## coef.var 3.994074e-01 1.440698e+00 NA NA
## EmploymentStatus EmploymentStatusDuration IsBorrowerHomeowner
## nbr.val NA 1.063120e+05 NA
## nbr.null NA 1.534000e+03 NA
## nbr.na NA 7.625000e+03 NA
## min NA 0.000000e+00 NA
## max NA 7.550000e+02 NA
## range NA 7.550000e+02 NA
## sum NA 1.021356e+07 NA
## median NA 6.700000e+01 NA
## mean NA 9.607158e+01 NA
## SE.mean NA 2.897687e-01 NA
## CI.mean NA 5.679427e-01 NA
## var NA 8.926585e+03 NA
## std.dev NA 9.448061e+01 NA
## coef.var NA 9.834397e-01 NA
## CurrentlyInGroup GroupKey DateCreditPulled CreditScoreRangeLower
## nbr.val NA NA NA 1.133460e+05
## nbr.null NA NA NA 1.330000e+02
## nbr.na NA NA NA 5.910000e+02
## min NA NA NA 0.000000e+00
## max NA NA NA 8.800000e+02
## range NA NA NA 8.800000e+02
## sum NA NA NA 7.770636e+07
## median NA NA NA 6.800000e+02
## mean NA NA NA 6.855677e+02
## SE.mean NA NA NA 1.973995e-01
## CI.mean NA NA NA 3.869000e-01
## var NA NA NA 4.416702e+03
## std.dev NA NA NA 6.645827e+01
## coef.var NA NA NA 9.693904e-02
## CreditScoreRangeUpper FirstRecordedCreditLine CurrentCreditLines
## nbr.val 1.133460e+05 NA 1.063330e+05
## nbr.null 0.000000e+00 NA 3.850000e+02
## nbr.na 5.910000e+02 NA 7.604000e+03
## min 1.900000e+01 NA 0.000000e+00
## max 8.990000e+02 NA 5.900000e+01
## range 8.800000e+02 NA 5.900000e+01
## sum 7.985993e+07 NA 1.097058e+06
## median 6.990000e+02 NA 1.000000e+01
## mean 7.045677e+02 NA 1.031719e+01
## SE.mean 1.973995e-01 NA 1.673743e-02
## CI.mean 3.869000e-01 NA 3.280514e-02
## var 4.416702e+03 NA 2.978830e+01
## std.dev 6.645827e+01 NA 5.457866e+00
## coef.var 9.432489e-02 NA 5.290069e-01
## OpenCreditLines TotalCreditLinespast7years OpenRevolvingAccounts
## nbr.val 1.063330e+05 1.132400e+05 1.139370e+05
## nbr.null 5.620000e+02 0.000000e+00 3.506000e+03
## nbr.na 7.604000e+03 6.970000e+02 0.000000e+00
## min 0.000000e+00 2.000000e+00 0.000000e+00
## max 5.400000e+01 1.360000e+02 5.100000e+01
## range 5.400000e+01 1.340000e+02 5.100000e+01
## sum 9.846610e+05 3.029684e+06 7.941170e+05
## median 9.000000e+00 2.500000e+01 6.000000e+00
## mean 9.260164e+00 2.675454e+01 6.969790e+00
## SE.mean 1.540275e-02 4.052720e-02 1.371954e-02
## CI.mean 3.018919e-02 7.943271e-02 2.689009e-02
## var 2.522696e+01 1.859915e+02 2.144588e+01
## std.dev 5.022644e+00 1.363787e+01 4.630970e+00
## coef.var 5.423926e-01 5.097404e-01 6.644346e-01
## OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries
## nbr.val 1.139370e+05 1.132400e+05 1.127780e+05
## nbr.null 5.227000e+03 5.000500e+04 8.430000e+03
## nbr.na 0.000000e+00 6.970000e+02 1.159000e+03
## min 0.000000e+00 0.000000e+00 0.000000e+00
## max 1.498500e+04 1.050000e+02 3.790000e+02
## range 1.498500e+04 1.050000e+02 3.790000e+02
## sum 4.538021e+07 1.625090e+05 6.297980e+05
## median 2.710000e+02 1.000000e+00 4.000000e+00
## mean 3.982922e+02 1.435085e+00 5.584405e+00
## SE.mean 1.324739e+00 7.243458e-03 1.914675e-02
## CI.mean 2.596468e+00 1.419707e-02 3.752735e-02
## var 1.999518e+05 5.941441e+00 4.134420e+01
## std.dev 4.471597e+02 2.437507e+00 6.429946e+00
## coef.var 1.122693e+00 1.698511e+00 1.151411e+00
## CurrentDelinquencies AmountDelinquent DelinquenciesLast7Years
## nbr.val 1.132400e+05 1.063150e+05 1.129470e+05
## nbr.null 8.974200e+04 8.981800e+04 7.643900e+04
## nbr.na 6.970000e+02 7.622000e+03 9.900000e+02
## min 0.000000e+00 0.000000e+00 0.000000e+00
## max 8.300000e+01 4.638810e+05 9.900000e+01
## range 8.300000e+01 4.638810e+05 9.900000e+01
## sum 6.704400e+04 1.046679e+08 4.692930e+05
## median 0.000000e+00 0.000000e+00 0.000000e+00
## mean 5.920523e-01 9.845071e+02 4.154984e+00
## SE.mean 5.880057e-03 2.195386e+01 3.023191e-02
## CI.mean 1.152482e-02 4.302926e+01 5.925409e-02
## var 3.915280e+00 5.124083e+07 1.032300e+02
## std.dev 1.978707e+00 7.158270e+03 1.016022e+01
## coef.var 3.342115e+00 7.270918e+00 2.445308e+00
## PublicRecordsLast10Years PublicRecordsLast12Months
## nbr.val 1.132400e+05 1.063330e+05
## nbr.null 8.580300e+04 1.049410e+05
## nbr.na 6.970000e+02 7.604000e+03
## min 0.000000e+00 0.000000e+00
## max 3.800000e+01 2.000000e+01
## range 3.800000e+01 2.000000e+01
## sum 3.540400e+04 1.605000e+03
## median 0.000000e+00 0.000000e+00
## mean 3.126457e-01 1.509409e-02
## SE.mean 2.162981e-03 4.725472e-04
## CI.mean 4.239410e-03 9.261861e-04
## var 5.297918e-01 2.374425e-02
## std.dev 7.278680e-01 1.540917e-01
## coef.var 2.328092e+00 1.020874e+01
## RevolvingCreditBalance BankcardUtilization
## nbr.val 1.063330e+05 1.063330e+05
## nbr.null 4.059000e+03 6.782000e+03
## nbr.na 7.604000e+03 7.604000e+03
## min 0.000000e+00 0.000000e+00
## max 1.435667e+06 5.950000e+00
## range 1.435667e+06 5.950000e+00
## sum 1.871323e+09 5.968563e+04
## median 8.549000e+03 6.000000e-01
## mean 1.759871e+04 5.613086e-01
## SE.mean 1.010048e+02 9.749470e-04
## CI.mean 1.979681e+02 1.910883e-03
## var 1.084807e+09 1.010718e-01
## std.dev 3.293640e+04 3.179179e-01
## coef.var 1.871524e+00 5.663871e-01
## AvailableBankcardCredit TotalTrades
## nbr.val 1.063930e+05 1.063930e+05
## nbr.null 4.881000e+03 4.000000e+00
## nbr.na 7.544000e+03 7.544000e+03
## min 0.000000e+00 0.000000e+00
## max 6.462850e+05 1.260000e+02
## range 6.462850e+05 1.260000e+02
## sum 1.192690e+09 2.471513e+06
## median 4.100000e+03 2.200000e+01
## mean 1.121023e+04 2.323003e+01
## SE.mean 6.075908e+01 3.639504e-02
## CI.mean 1.190870e+02 7.133377e-02
## var 3.927674e+08 1.409280e+02
## std.dev 1.981836e+04 1.187131e+01
## coef.var 1.767882e+00 5.110329e-01
## TradesNeverDelinquent..percentage. TradesOpenedLast6Months
## nbr.val 1.063930e+05 1.063930e+05
## nbr.null 5.100000e+01 5.424900e+04
## nbr.na 7.544000e+03 7.544000e+03
## min 0.000000e+00 0.000000e+00
## max 1.000000e+00 2.000000e+01
## range 1.000000e+00 2.000000e+01
## sum 9.425326e+04 8.536200e+04
## median 9.400000e-01 0.000000e+00
## mean 8.858972e-01 8.023272e-01
## SE.mean 4.542873e-04 3.365132e-03
## CI.mean 8.903968e-04 6.595612e-03
## var 2.195706e-02 1.204806e+00
## std.dev 1.481791e-01 1.097637e+00
## coef.var 1.672645e-01 1.368066e+00
## DebtToIncomeRatio IncomeRange IncomeVerifiable
## nbr.val 1.053830e+05 NA NA
## nbr.null 1.900000e+01 NA NA
## nbr.na 8.554000e+03 NA NA
## min 0.000000e+00 NA NA
## max 1.001000e+01 NA NA
## range 1.001000e+01 NA NA
## sum 2.908008e+04 NA NA
## median 2.200000e-01 NA NA
## mean 2.759466e-01 NA NA
## SE.mean 1.699668e-03 NA NA
## CI.mean 3.331326e-03 NA NA
## var 3.044379e-01 NA NA
## std.dev 5.517589e-01 NA NA
## coef.var 1.999513e+00 NA NA
## StatedMonthlyIncome LoanKey TotalProsperLoans
## nbr.val 1.139370e+05 NA 2.208500e+04
## nbr.null 1.394000e+03 NA 1.000000e+00
## nbr.na 0.000000e+00 NA 9.185200e+04
## min 0.000000e+00 NA 0.000000e+00
## max 1.750003e+06 NA 8.000000e+00
## range 1.750003e+06 NA 8.000000e+00
## sum 6.389616e+08 NA 3.138500e+04
## median 4.666667e+03 NA 1.000000e+00
## mean 5.608026e+03 NA 1.421100e+00
## SE.mean 2.215552e+01 NA 5.141249e-03
## CI.mean 4.342448e+01 NA 1.007722e-02
## var 5.592792e+07 NA 5.837605e-01
## std.dev 7.478497e+03 NA 7.640422e-01
## coef.var 1.333535e+00 NA 5.376413e-01
## TotalProsperPaymentsBilled OnTimeProsperPayments
## nbr.val 2.208500e+04 2.208500e+04
## nbr.null 6.500000e+01 7.500000e+01
## nbr.na 9.185200e+04 9.185200e+04
## min 0.000000e+00 0.000000e+00
## max 1.410000e+02 1.410000e+02
## range 1.410000e+02 1.410000e+02
## sum 5.065050e+05 4.918760e+05
## median 1.600000e+01 1.500000e+01
## mean 2.293434e+01 2.227195e+01
## SE.mean 1.295307e-01 1.267102e-01
## CI.mean 2.538894e-01 2.483609e-01
## var 3.705465e+02 3.545849e+02
## std.dev 1.924958e+01 1.883042e+01
## coef.var 8.393344e-01 8.454772e-01
## ProsperPaymentsLessThanOneMonthLate
## nbr.val 2.208500e+04
## nbr.null 1.828500e+04
## nbr.na 9.185200e+04
## min 0.000000e+00
## max 4.200000e+01
## range 4.200000e+01
## sum 1.355200e+04
## median 0.000000e+00
## mean 6.136292e-01
## SE.mean 1.646473e-02
## CI.mean 3.227205e-02
## var 5.986963e+00
## std.dev 2.446827e+00
## coef.var 3.987469e+00
## ProsperPaymentsOneMonthPlusLate ProsperPrincipalBorrowed
## nbr.val 2.208500e+04 2.208500e+04
## nbr.null 2.170000e+04 1.000000e+00
## nbr.na 9.185200e+04 9.185200e+04
## min 0.000000e+00 0.000000e+00
## max 2.100000e+01 7.249900e+04
## range 2.100000e+01 7.249900e+04
## sum 1.072000e+03 1.871110e+08
## median 0.000000e+00 6.000000e+03
## mean 4.853973e-02 8.472312e+03
## SE.mean 3.743250e-03 4.976446e+01
## CI.mean 7.337037e-03 9.754189e+01
## var 3.094532e-01 5.469353e+07
## std.dev 5.562852e-01 7.395508e+03
## coef.var 1.146041e+01 8.729031e-01
## ProsperPrincipalOutstanding ScorexChangeAtTimeOfListing
## nbr.val 2.208500e+04 1.892800e+04
## nbr.null 5.943000e+03 1.127000e+03
## nbr.na 9.185200e+04 9.500900e+04
## min 0.000000e+00 -2.090000e+02
## max 2.345095e+04 2.860000e+02
## range 2.345095e+04 4.950000e+02
## sum 6.471598e+07 -6.100900e+04
## median 1.626550e+03 -3.000000e+00
## mean 2.930314e+03 -3.223214e+00
## SE.mean 2.561489e+01 3.638894e-01
## CI.mean 5.020702e+01 7.132558e-01
## var 1.449047e+07 2.506361e+03
## std.dev 3.806635e+03 5.006357e+01
## coef.var 1.299054e+00 -1.553219e+01
## LoanCurrentDaysDelinquent LoanFirstDefaultedCycleNumber
## nbr.val 1.139370e+05 1.695200e+04
## nbr.null 9.486000e+04 7.000000e+01
## nbr.na 0.000000e+00 9.698500e+04
## min 0.000000e+00 0.000000e+00
## max 2.704000e+03 4.400000e+01
## range 2.704000e+03 4.400000e+01
## sum 1.741146e+07 2.757830e+05
## median 0.000000e+00 1.400000e+01
## mean 1.528165e+02 1.626846e+01
## SE.mean 1.381503e+00 6.916981e-02
## CI.mean 2.707725e+00 1.355800e-01
## var 2.174546e+05 8.110620e+01
## std.dev 4.663203e+02 9.005898e+00
## coef.var 3.051504e+00 5.535801e-01
## LoanMonthsSinceOrigination LoanNumber LoanOriginalAmount
## nbr.val 1.139370e+05 1.139370e+05 1.139370e+05
## nbr.null 1.822000e+03 0.000000e+00 0.000000e+00
## nbr.na 0.000000e+00 0.000000e+00 0.000000e+00
## min 0.000000e+00 1.000000e+00 1.000000e+03
## max 1.000000e+02 1.364860e+05 3.500000e+04
## range 1.000000e+02 1.364850e+05 3.400000e+04
## sum 3.634235e+06 7.912295e+09 9.498943e+08
## median 2.100000e+01 6.859900e+04 6.500000e+03
## mean 3.189688e+01 6.944447e+04 8.337014e+03
## SE.mean 8.880041e-02 1.153340e+02 1.850358e+01
## CI.mean 1.740475e-01 2.260529e+02 3.626673e+01
## var 8.984517e+02 1.515582e+09 3.901002e+07
## std.dev 2.997418e+01 3.893048e+04 6.245801e+03
## coef.var 9.397215e-01 5.605987e-01 7.491652e-01
## LoanOriginationDate LoanOriginationQuarter MemberKey
## nbr.val NA NA NA
## nbr.null NA NA NA
## nbr.na NA NA NA
## min NA NA NA
## max NA NA NA
## range NA NA NA
## sum NA NA NA
## median NA NA NA
## mean NA NA NA
## SE.mean NA NA NA
## CI.mean NA NA NA
## var NA NA NA
## std.dev NA NA NA
## coef.var NA NA NA
## MonthlyLoanPayment LP_CustomerPayments
## nbr.val 1.139370e+05 1.139370e+05
## nbr.null 9.350000e+02 6.208000e+03
## nbr.na 0.000000e+00 0.000000e+00
## min 0.000000e+00 -2.349900e+00
## max 2.251510e+03 4.070239e+04
## range 2.251510e+03 4.070474e+04
## sum 3.104507e+07 4.766075e+08
## median 2.177400e+02 2.583830e+03
## mean 2.724758e+02 4.183079e+03
## SE.mean 5.708794e-01 1.419337e+01
## CI.mean 1.118915e+00 2.781878e+01
## var 3.713245e+04 2.295279e+07
## std.dev 1.926978e+02 4.790907e+03
## coef.var 7.072108e-01 1.145306e+00
## LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees
## nbr.val 1.139370e+05 1.139370e+05 1.139370e+05
## nbr.null 6.308000e+03 6.223000e+03 7.164000e+03
## nbr.na 0.000000e+00 0.000000e+00 0.000000e+00
## min 0.000000e+00 -2.349900e+00 -6.648700e+02
## max 3.500000e+04 1.561703e+04 3.206000e+01
## range 3.500000e+04 1.561938e+04 6.969300e+02
## sum 3.538355e+08 1.227720e+08 -6.235275e+06
## median 1.587500e+03 7.008401e+02 -3.444000e+01
## mean 3.105537e+03 1.077543e+03 -5.472564e+01
## SE.mean 1.205623e+01 3.505939e+00 1.797548e-01
## CI.mean 2.363003e+01 6.871587e+00 3.523166e-01
## var 1.656106e+07 1.400469e+06 3.681507e+03
## std.dev 4.069528e+03 1.183414e+03 6.067542e+01
## coef.var 1.310410e+00 1.098252e+00 -1.108720e+00
## LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
## nbr.val 1.139370e+05 1.139370e+05 1.139370e+05
## nbr.null 1.057710e+05 9.703400e+04 9.722200e+04
## nbr.na 0.000000e+00 0.000000e+00 0.000000e+00
## min -9.274750e+03 -9.420000e+01 -9.545500e+02
## max 0.000000e+00 2.500000e+04 2.500000e+04
## range 9.274750e+03 2.509420e+04 2.595455e+04
## sum -1.622770e+06 7.980675e+07 7.763901e+07
## median 0.000000e+00 0.000000e+00 0.000000e+00
## mean -1.424270e+01 7.004463e+02 6.814205e+02
## SE.mean 3.236089e-01 7.076123e+00 6.983256e+00
## CI.mean 6.342686e-01 1.386909e+01 1.368708e+01
## var 1.193180e+04 5.704998e+06 5.556237e+06
## std.dev 1.092328e+02 2.388514e+03 2.357167e+03
## coef.var -7.669387e+00 3.409988e+00 3.459196e+00
## LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## nbr.val 1.139370e+05 1.139370e+05 1.139370e+05
## nbr.null 1.106760e+05 0.000000e+00 1.096780e+05
## nbr.na 0.000000e+00 0.000000e+00 0.000000e+00
## min 0.000000e+00 7.000000e-01 0.000000e+00
## max 2.111790e+04 1.012500e+00 3.900000e+01
## range 2.111790e+04 3.125000e-01 3.900000e+01
## sum 2.864682e+06 1.137756e+05 5.472000e+03
## median 0.000000e+00 1.000000e+00 0.000000e+00
## mean 2.514269e+01 9.985835e-01 4.802654e-02
## SE.mean 8.166540e-01 5.308565e-05 9.846166e-04
## CI.mean 1.600629e+00 1.040471e-04 1.929834e-03
## var 7.598730e+04 3.210843e-04 1.104585e-01
## std.dev 2.756579e+02 1.791882e-02 3.323530e-01
## coef.var 1.096374e+01 1.794424e-02 6.920194e+00
## InvestmentFromFriendsCount InvestmentFromFriendsAmount
## nbr.val 1.139370e+05 1.139370e+05
## nbr.null 1.118060e+05 1.118060e+05
## nbr.na 0.000000e+00 0.000000e+00
## min 0.000000e+00 0.000000e+00
## max 3.300000e+01 2.500000e+04
## range 3.300000e+01 2.500000e+04
## sum 2.673000e+03 1.885743e+06
## median 0.000000e+00 0.000000e+00
## mean 2.346033e-02 1.655075e+01
## SE.mean 6.885352e-04 8.726094e-01
## CI.mean 1.349518e-03 1.710301e+00
## var 5.401533e-02 8.675701e+04
## std.dev 2.324120e-01 2.945454e+02
## coef.var 9.906593e+00 1.779650e+01
## Investors
## nbr.val 1.139370e+05
## nbr.null 0.000000e+00
## nbr.na 0.000000e+00
## min 1.000000e+00
## max 1.189000e+03
## range 1.188000e+03
## sum 9.169106e+06
## median 4.400000e+01
## mean 8.047523e+01
## SE.mean 3.058521e-01
## CI.mean 5.994655e-01
## var 1.065830e+04
## std.dev 1.032390e+02
## coef.var 1.282867e+00
There is a surprising drop in loans in 2009. In searching the news during
that time I found this Prosper News Story.
In October of 2008, the SEC forced Prosper.com to stop brokering new loans
temporarily while it determined whether Prosper’s loans should be classified
as securities. After a six-month quiet period Prosper reopened to lenders
and borrowers. Prosper also made other changes to its business, including
allowing only borrowers with a credit score above 640 to request a loan. I
added a cohort variable for loans before 2010 and after 2010 to see other
interesting similarities and differences.
## Q4 2005 Q1 2006 Q2 2006 Q3 2006 Q4 2006 Q1 2007 Q2 2007 Q3 2007 Q4 2007
## 22 315 1254 1934 2403 3079 3118 2671 2592
## Q1 2008 Q2 2008 Q3 2008 Q4 2008 Q2 2009 Q3 2009 Q4 2009 Q1 2010 Q2 2010
## 3074 4344 3602 532 13 585 1449 1243 1539
## Q3 2010 Q4 2010 Q1 2011 Q2 2011 Q3 2011 Q4 2011 Q1 2012 Q2 2012 Q3 2012
## 1270 1600 1744 2478 3093 3913 4435 5061 5632
## Q4 2012 Q1 2013 Q2 2013 Q3 2013 Q4 2013 Q1 2014
## 4425 3616 7099 9180 14450 12172
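The quarter labels above sort chronologically rather than alphabetically, and the loans are split into a before-2010 and an after-2010 cohort. Here is a minimal sketch of one way to get both, assuming labels in the "Qn YYYY" form and a cohort cut at the start of 2010. The vector `quarter` and the names `quarter_f` and `cohort` are hypothetical.

```r
# Made-up quarter labels in the same "Qn YYYY" form as the table above.
quarter <- c("Q4 2013", "Q1 2010", "Q4 2008", "Q3 2013", "Q2 2009", "Q4 2013")

# Split each label into its quarter number and its year.
q_num <- as.integer(substr(quarter, 2, 2))
year  <- as.integer(substr(quarter, 4, 7))

# Order the distinct labels by year, then quarter, and use that as level order.
chron_levels <- unique(quarter[order(year, q_num)])
quarter_f <- factor(quarter, levels = chron_levels)

# Assumed cohort rule: originated before 2010 vs in 2010 or later.
cohort <- factor(ifelse(year < 2010, "pre-2010", "post-2010"),
                 levels = c("pre-2010", "post-2010"))

table(quarter_f)   # counts per quarter, in chronological order
table(cohort)      # counts per cohort
```

Without explicit levels, a factor built from these labels would sort all the Q1 labels first, which scrambles the timeline in both tables and plots.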
I’ve colored the graphs by cohort to distinguish loans before and after 2010.
There is a drop at the beginning of 2013 followed by fairly consistent growth
in the number of loans, peaking at 14450 in Q4 of 2013.
January is the biggest month for new loans, followed by October and December.
The number of loans listed rises over the month and peaks towards the end of
the month. The highest count falls on the 30th.
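Counts by calendar month and by day of the month can be derived from a timestamp column. The sketch below uses made-up timestamps in the same format as ListingCreationDate. Which date column was actually used for the plots is not stated, so treat the choice as an assumption.

```r
# Made-up timestamps in the "YYYY-MM-DD hh:mm:ss" form seen in the summary.
created <- c("2013-10-30 17:20:16", "2014-01-30 09:27:44",
             "2014-01-15 05:43:13", "2013-12-06 11:44:58")

# Keep only the date part; the time of day is not needed for these counts.
d <- as.Date(substr(created, 1, 10))

# Month as a factor with all twelve levels, so empty months show up as zero.
month_name <- factor(format(d, "%m"), levels = sprintf("%02d", 1:12),
                     labels = month.abb)

# Day of the month as an integer from 1 to 31.
day_of_month <- as.integer(format(d, "%d"))

table(month_name)     # loans per calendar month
table(day_of_month)   # loans per day of the month
```

Building the month factor from the numeric month code rather than from `%b` avoids depending on the locale for month names.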
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 2500 4200 6050 7500 25000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 7784 9191 14000 35000
There is a difference in loan amount before 2010 and after 2010. The minimum
loan amount remained the same at $1000 pre-2010 and post-2010. However, the
maximum loan amount increased from $25000 to $35000.
The mean increased from $6050 to $9191. Post-2010 there are spikes in the
number of loans at $4000, $10000, and $15000.
For both pre-2010 and post-2010 data the distribution is skewed to the right.
Post-2010 a small number of loans are greater than $25000.
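The per-cohort summaries and the right-skew reading can be reproduced with `tapply()`. This is a sketch on made-up amounts; `amount` and `cohort` are hypothetical stand-ins for the loan amount column and the cohort flag.

```r
# Made-up loan amounts, five per cohort.
amount <- c(1000, 2500, 4200, 7500, 25000,
            1000, 4000, 7800, 14000, 35000)
cohort <- rep(c("pre-2010", "post-2010"), each = 5)

# One summary() per cohort, like the two tables above.
by_cohort <- tapply(amount, cohort, summary)
by_cohort

# Mean minus median per cohort: a clearly positive gap hints at right skew.
gap <- tapply(amount, cohort, function(x) mean(x) - median(x))
gap
```

A positive gap is only a quick indicator. A histogram per cohort remains the more reliable way to judge the shape of the distribution.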
## NotAvailable DebtConsolidation HomeImprovement Business
## 16945 5993 813 2088
## PersonalLoan StudentUse Auto Other
## 2395 595 450 1708
## BabyAndAdoption Boat CosmeticProcedure EngagmentRing
## 0 0 0 0
## GreenLoans HouseholdExpenses LargePurchases MedicalDental
## 0 0 0 0
## Motorcycle RV Taxes Vacation
## 0 0 0 0
## WeddingLoan
## 0
## NotAvailable DebtConsolidation HomeImprovement Business
## 20 52315 6620 5101
## PersonalLoan StudentUse Auto Other
## 0 161 2122 8786
## BabyAndAdoption Boat CosmeticProcedure EngagmentRing
## 199 85 91 217
## GreenLoans HouseholdExpenses LargePurchases MedicalDental
## 59 1996 876 1522
## Motorcycle RV Taxes Vacation
## 304 52 885 768
## WeddingLoan
## 771
Pre-2010 the listing category wasn’t very informative: almost 17000 of the
loans had a listing category of “Not Available”. Post-2010 there are only 20
loans where this information is not available.
Debt consolidation is the largest category, with a little over 50% of all
loans listed (about 63% of post-2010 loans).
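The category shares can be checked directly from the two count tables above. The counts below are copied from that output. The vector names are my own, and the category labels are kept exactly as printed.

```r
# Category labels as printed in the tables above.
category <- c("NotAvailable", "DebtConsolidation", "HomeImprovement", "Business",
              "PersonalLoan", "StudentUse", "Auto", "Other", "BabyAndAdoption",
              "Boat", "CosmeticProcedure", "EngagmentRing", "GreenLoans",
              "HouseholdExpenses", "LargePurchases", "MedicalDental",
              "Motorcycle", "RV", "Taxes", "Vacation", "WeddingLoan")

# Counts copied from the pre-2010 and post-2010 tables.
pre  <- c(16945, 5993, 813, 2088, 2395, 595, 450, 1708, rep(0, 13))
post <- c(20, 52315, 6620, 5101, 0, 161, 2122, 8786, 199, 85, 91, 217,
          59, 1996, 876, 1522, 304, 52, 885, 768, 771)
names(pre) <- names(post) <- category

# Share of each category within a cohort and across all loans.
share_pre  <- pre / sum(pre)
share_post <- post / sum(post)
share_all  <- (pre + post) / sum(pre + post)

# Debt consolidation share in percent, by cohort and overall.
debt_share <- round(100 * c(pre  = share_pre[["DebtConsolidation"]],
                            post = share_post[["DebtConsolidation"]],
                            all  = share_all[["DebtConsolidation"]]), 1)
debt_share
```

The two cohorts together account for all 113937 loans, so the overall share is the one that lands a little above 50%.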
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 82.44 149.03 212.06 271.25 1130.90
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 160.1 256.4 295.0 390.4 2251.5
The distribution for MonthlyLoanPayment is right skewed: both the pre-2010 and
post-2010 mean values are greater than the median and there is a long tail to
the right. The median payment is about $107 higher post-2010 ($256.40) than
pre-2010 ($149.03).
Loans have terms of 1, 3, or 5 years. Almost 80% of loans are for a 3-year
term.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 600.0 640.0 648.2 700.0 880.0 591
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 600.0 660.0 700.0 699.3 720.0 880.0
Here are the Credit Score Ranges as defined by Experian, TransUnion, and Equifax:
Pre-2010, Prosper allowed loans to be listed for borrowers with a
“Bad” credit rating. In addition there were 591 pre-2010 loans listed where
the lower credit score was “Not Available”.
The median rose from 640 pre-2010 to 700 post-2010. The pre-2010 lower credit
score data is skewed slightly to the right since the mean (648.2) is greater
than the median (640). The post-2010 lower credit score data looks roughly
symmetric: the median and mean are very close at 700 and 699.3 respectively.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 19.0 619.0 659.0 667.2 719.0 899.0 591
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 619.0 679.0 719.0 718.3 739.0 899.0
Pre-2010 borrowers had a median upper credit score of 659, while post-2010
borrowers had a median upper credit score of 719. The upper credit score
values look normally distributed.
## N/A NC HR E D C B A AA NA's
## 0 141 3508 3289 5153 5649 4389 3315 3509 2034
The Credit Grade is the rating that was assigned to pre-2010 loans at the
time the listing went live. I could not determine via the Prosper website
search how the Credit Grade was determined. Credit Grades, from lowest-risk
to highest-risk, are labeled AA, A, B, C, D, E, HR (“High Risk”), and NC
(“No Credit”).
There were 141 loans listed for “NC” (No Credit) borrowers and 3508 loans
listed for “HR” (High Risk) borrowers. It is surprising how many loans were
listed for borrowers in the Credit Grade groups lower than C.
## N/A HR E D C B A AA NA's
## 0 6728 9602 13941 17955 15483 14093 5075 73
Prosper Ratings, from lowest-risk to highest-risk, are labeled AA, A, B, C, D,
E, and HR (“High Risk”). Post-2010 Prosper provides a proprietary “Prosper
Rating” based on the company’s estimation of that borrower’s “estimated loss
rate.”
According to the company, the Prosper Rating is determined by two scores:
Even though Credit Grade and Prosper Rating look similar, I decided not to
combine them for my analysis since there was a decision by the company to no
longer use Credit Grade and switch to Prosper Rating.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.000 4.000 6.000 5.906 8.000 11.000 73
##
## 1 2 3 4 5 6 7 8 9 10 11
## 971 5750 7613 12541 9695 12103 10367 11629 6245 4507 1456
The Prosper Score is a custom risk score for post-2010 loans built using
historical Prosper data. The score ranges from 1 to 11, with 11 being the best,
or lowest-risk, score. The Prosper Score estimates the probability of a loan
going “bad,” where “bad” means going 60+ days past due within
the first twelve months from the date of loan origination. Prosper Scorecard Link
The Prosper Score looks roughly normally distributed.
The distribution of Income Ranges is left skewed with over 50% of the loans
falling in the $25,000 - $75,000 income ranges. Prior to 2010 almost 6% of
loans didn’t include borrower income range information. This is no longer
the case post-2010, where loans do contain borrower income range
information.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1250 0.1700 0.1840 0.2375 0.4975
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0400 0.1364 0.1875 0.1960 0.2573 0.3600
The Borrower Rate is the borrower’s interest rate for this loan. The median
Borrower Rate for pre-2010 loans was 17%, which is lower than the median of
18.75% post-2010. The max Borrower Rate for pre-2010 loans was about 50% vs 36%
post-2010. The interquartile range pre-2010 is 11.25%, while post-2010 the
interquartile range is 12.09%.
It surprised me that borrowers were paying a higher rate for a Prosper loan
post-2010 compared to pre-2010 since interest rates for things like mortgages
were lower. Link to Historical Mortgage Rates.
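The interquartile ranges follow directly from the printed quartiles. The sketch below re-enters the 1st and 3rd quartiles from the two summaries above; the data frame and column names are illustrative.

```r
# Quartiles copied from the BorrowerRate summaries above.
quartiles <- data.frame(
  period = c("pre-2010", "post-2010"),
  q1     = c(0.1250, 0.1364),
  q3     = c(0.2375, 0.2573)
)

# Interquartile range = 3rd quartile - 1st quartile.
quartiles$iqr <- quartiles$q3 - quartiles$q1
print(quartiles)
```

On the raw column, the base R function `IQR()` returns the same quantity without going through `summary()` first.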
I defined a factor variable for loans in “GoodStanding” vs “InDefault” to
group loans for further analysis. I grouped loans as “InDefault” if the loan
status might affect a borrower’s credit rating.
Loans in the “InDefault” group:
Pre-2010 the percentage of loans “InDefault” was about 35%. Post-2010 about
10% of loans listed were “InDefault”.
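A grouping variable of this kind could be built roughly as follows. This is only a sketch: the status values and the choice of which statuses count as “InDefault” are assumptions for illustration, not the list used in this report.

```r
# Hypothetical status values; the sample below is made up.
loan_status <- c("Current", "Completed", "Chargedoff", "Defaulted",
                 "Current", "Past Due (61-90 days)")

# Assumed set of statuses treated as "InDefault" for this example only.
in_default_statuses <- c("Chargedoff", "Defaulted", "Past Due (61-90 days)")

loan_group <- factor(
  ifelse(loan_status %in% in_default_statuses, "InDefault", "GoodStanding"),
  levels = c("GoodStanding", "InDefault")
)

# Percentage of loans in each group.
group_pct <- round(100 * prop.table(table(loan_group)), 1)
print(group_pct)
```

Keeping the list of “InDefault” statuses in one vector makes the grouping rule easy to review and change.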
The Prosper loans dataset contains 113937 rows and 81 variables. There are 70
numeric variables and 26 factor variables. The dataset met the criteria for
tidiness as defined by Wikipedia here.
The main area I am focused on is how the change in Prosper’s loan business
affected its two customer groups: borrowers and lenders.
Lender - What features are of interest and may affect whether the loan will get
paid back?
LoanOriginalAmount
Yes, I created some new variables, namely:
Each of these variables helped with creating graphs with different or
labeled groups.
I performed some operations to adjust the data, mainly to order factor
variables correctly.
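Ordering a factor usually comes down to passing explicit `levels`. The sketch below uses the Credit Grade labels that appear in this report; the sample vector is made up, and the original code may have done this differently.

```r
# Credit Grade labels from the report, ordered from highest risk to
# lowest risk.
grade_levels <- c("NC", "HR", "E", "D", "C", "B", "A", "AA")

# Made-up sample of grades in arbitrary order.
grades <- c("B", "AA", "HR", "C", "NC", "A")
credit_grade <- factor(grades, levels = grade_levels, ordered = TRUE)

# Tables and plots now follow the risk order instead of alphabetical order.
print(table(credit_grade))

# Ordered factors also support comparisons, e.g. grades of B or better.
print(credit_grade[credit_grade >= "B"])
```

Without explicit levels, R sorts the labels alphabetically, which would place “AA” between “A” and “B” and scatter the risk order on a plot axis.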
This chart shows the relationship between Credit Grade and Upper Credit Score.
There are a surprising number of loans for borrowers who either have “No
Credit” or are considered “High Risk”. It also seems odd that the borrowers in
the “High Risk” grade are not all in the same credit score category, namely
“Bad”. It would seem confusing to a lender to have this group split when the
E and D Credit Grades are much more uniform.
This chart shows the relationship between Prosper Rating and Upper
Credit Score. Post-2010 there are no borrowers with a “Bad” credit score. It
also seems odd that there are borrowers with a “Poor” through “Exceptional”
credit rating in the High Risk category. It would seem very confusing to a
lender to have this mix of credit ratings across the Prosper Rating groups. It
is very odd to have a “High Risk” loan for a borrower with an “Exceptional”
credit rating.
Even with the mix of credit ratings within a Prosper Rating, there is a clear
increase in credit scores as the Prosper Rating improves.
These two graphs show the relationship between Borrower Rate and the
pre-2010 Credit Grade and post-2010 Prosper Rating. The Borrower Rates for
Credit Grades “NC” through “E” were between 15% and 28%. For Credit Grades D,
C, B, A, and AA there is a downward stair step.
Post-2010, the Borrower Rate for the “HR” Prosper Rating has a median value
above 30%, which is higher than the pre-2010 range for the riskiest Credit
Grades. There is a clear downward stair step towards “AA”, where
the Borrower Rate median and interquartile range are below 10%.
These three graphs compare the loan status to each of the three variables
measuring credit quality and risk. The Credit Grade graph shows that the 35%
of defaults in loans pre-2010 occurred across all the Credit Grades. I was
surprised to see about 2% of loans in AA in default, which made me question the
validity of the rating. As I mentioned earlier in the report, I could not find
details on how the Credit Grade was determined.
The Prosper Rating graph shows a decrease in loans “InDefault” as the Prosper
Rating increases. This is more in line with what I would expect for a borrower
loan rating.
The Prosper Score is an internal scorecard and estimates the probability of a
loan going “bad,” but only looks at the possibility of a loan going bad within
the first year of the loan. In the univariate analysis, we saw that almost
80% of the loans are for three years. As we can see in the chart above, loans
with a Prosper Score of six or above still had about a 1% default rate. If
Prosper updated their model to look at longer timelines, this score could be
more accurate and reduce the post-2010 10% default rate.
I used a scatter plot with an alpha of 1/4 to plot Loan Original Amounts vs
the Credit Grade and Prosper Rating. The graph of Credit Grade shows a larger
number of defaults for loan amounts greater than $5000 for Credit Grades B and
lower. Across credit grades, the higher loan amounts are more likely to
be “InDefault”, as shown by the predominantly blue color.
The Prosper Rating graph shows the much lower rates of default that we saw
earlier. It also shows lower loan amounts for borrowers in the High Risk
group post-2010 compared to pre-2010. Post-2010 shows much more restraint and
consistency when loaning money to borrowers with lower ratings.
For both plots, I only looked at loans for $10000 to be able to compare Monthly
Loan Payments pre-2010 and post-2010. I took the sqrt of the Monthly Loan
Payment to adjust for the skewness in the data. The plot for Monthly Loan
Payments post-2010 shows that the interquartile ranges for payments across all
Prosper Ratings are below $500. For pre-2010 loans it was interesting
to see that the interquartile ranges for loans in default were all above $500. We
can see from these charts that a monthly loan payment below $500 has an impact
on keeping the loan in good standing.
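The effect of a square-root transform on right-skewed data can be illustrated with a few made-up payments. The values and the helper `skewness()` below are assumptions for this sketch (base R has no built-in skewness function); they are not taken from the Prosper data.

```r
# Made-up, right-skewed monthly payments (in dollars).
payments <- c(50, 80, 100, 120, 150, 170, 200, 250, 300, 400, 600, 1100)

# Moment-based sample skewness: 0 means symmetric, > 0 means right-skewed.
skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

skew_raw  <- skewness(payments)
skew_sqrt <- skewness(sqrt(payments))

# The transformed values should show a smaller positive skew.
print(c(raw = skew_raw, sqrt = skew_sqrt))
```

The square root compresses large values more than small ones, which pulls in the long right tail while keeping the ordering of the payments unchanged.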
I summarized my findings after each plot or group of plots above.
The Monthly Loan Payment analysis was interesting. Having a payment amount
that the borrower can make each month has a big influence on keeping the loan
in good standing.
The strongest relationship I found was between BorrowerRate and ProsperRating.
It was interesting because, if a borrower wanted to reduce the rate they paid,
it wasn’t transparent what actions they could take. The Prosper
Rating is composed of the Credit Score and the Prosper Score. If they want to
increase their credit score they can call the credit service and see details.
For the Prosper Rating or Prosper Score it isn’t as clear how those are
calculated and how they affect the rate the borrower pays.
##
## Calls:
## Model 1 : lm(formula = BorrowerRate ~ Term, data = subset_vars)
## Model 2 : lm(formula = BorrowerRate ~ Term + CreditScoreRangeUpper, data = subset_vars)
## Model 3 : lm(formula = BorrowerRate ~ Term + CreditScoreRangeUpper + DebtToIncomeRatio,
## data = subset_vars)
## Model 4 : lm(formula = BorrowerRate ~ Term + CreditScoreRangeUpper + DebtToIncomeRatio +
## ProsperScore, data = subset_vars)
## Model 5 : lm(formula = BorrowerRate ~ Term + CreditScoreRangeUpper + DebtToIncomeRatio +
## ProsperScore, data = subset_vars)
## Model 6 : lm(formula = BorrowerRate ~ Term + CreditScoreRangeUpper + DebtToIncomeRatio +
## ProsperScore + ProsperRating..numeric., data = subset_vars)
##
## =======================================================================================================
##                             Model 1      Model 2      Model 3      Model 4      Model 5      Model 6
## -------------------------------------------------------------------------------------------------------
## (Intercept)                   0.196***     0.764***     0.766***     0.635***     0.635***     0.328***
##                              (0.001)      (0.003)      (0.004)      (0.003)      (0.003)      (0.001)
## Term                         -0.000        0.000***     0.000***     0.000***     0.000***     0.000***
##                              (0.000)      (0.000)      (0.000)      (0.000)      (0.000)      (0.000)
## CreditScoreRangeUpper                     -0.001***    -0.001***    -0.000***    -0.000***     0.000***
##                                           (0.000)      (0.000)      (0.000)      (0.000)      (0.000)
## DebtToIncomeRatio                                       0.028***     0.010***     0.010***    -0.001**
##                                                        (0.001)      (0.001)      (0.001)      (0.000)
## ProsperScore                                                        -0.017***    -0.017***     0.001***
##                                                                     (0.000)      (0.000)      (0.000)
## ProsperRating..numeric.                                                                       -0.045***
##                                                                                               (0.000)
## -------------------------------------------------------------------------------------------------------
## sigma                         0.074        0.064        0.062        0.051        0.051        0.021
## R-squared                     0.000        0.253        0.289        0.526        0.526        0.920
## F                             0.015    14032.034    10290.557    20992.080    20992.080   174124.310
## p                             0.901        0.000        0.000        0.000        0.000        0.000
## N                         82877        82877        75772        75772        75772        75772
## =======================================================================================================
I focused my analysis in this section on looking at how to increase the
ability of borrowers to keep their loans in good standing.
The first two plots looked at MonthlyLoanPayment vs Term by LoanStatus for
borrowers with a ProsperScore below 6. I thought about looking at this as
if I were a data analyst for Prosper. There might be an opportunity for
Prosper to increase the Term or provide more flexibility on the Term (4 year
loans?) to reduce the monthly payment amount to < $250 and increase the
ability of the borrower to make the lower payments.
The second two plots looked at MonthlyLoanPayment vs Loan Original Amount by
Loan Status for borrowers with a ProsperScore below 6. Again I thought about
looking at this as if I were a data analyst for Prosper. There might be an
opportunity to refine the Loan Consolidation model to keep the payment
amounts lower.
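To see how much a longer term could lower the payment, here is a rough sketch using the standard fixed-rate amortization formula, payment = P * r / (1 - (1 + r)^(-n)), with monthly rate r and n monthly payments. It assumes equal monthly payments with monthly compounding and ignores fees, so it is not necessarily how Prosper computes MonthlyLoanPayment; the 48 month term is hypothetical.

```r
# Standard fixed-rate amortization formula. Assumes monthly compounding
# and equal payments; Prosper's actual calculation may differ.
monthly_payment <- function(principal, annual_rate, term_months) {
  r <- annual_rate / 12
  principal * r / (1 - (1 + r)^(-term_months))
}

# Illustrative inputs: a $10000 loan at the post-2010 median rate of 18.75%.
# The 48 month term is hypothetical; it is not offered in the dataset.
terms    <- c(36, 48, 60)
payments <- monthly_payment(10000, 0.1875, terms)

print(round(setNames(payments, paste(terms, "months")), 2))
```

Under these assumptions the payment falls as the term grows, although the borrower pays more interest in total over the longer term.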
I created a model for BorrowerRate to better understand how it was calculated.
I looked at Term, CreditScoreRangeUpper, DebtToIncomeRatio, ProsperScore, and
ProsperRating..numeric. The fit improves sharply once ProsperRating..numeric.
is added, with R-squared reaching .92. It was surprising that the models built
on credit score explained so little: R-squared was .25 with Term and
CreditScoreRangeUpper, and .29 after adding DebtToIncomeRatio. It would be a
challenge as a borrower to figure out how to get a better rate on the Prosper
platform without increased transparency into how those metrics are calculated.
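The idea of adding predictors step by step and watching R-squared can be sketched with `lm()` on synthetic data. Everything below is simulated for illustration: the variable names are stand-ins and the coefficients are arbitrary, so the numbers do not reproduce the model table above.

```r
set.seed(42)  # synthetic data only; not the Prosper dataset
n <- 500

# Stand-ins for ProsperRating..numeric., CreditScoreRangeUpper, BorrowerRate.
rating <- sample(1:7, n, replace = TRUE)
credit <- 600 + 25 * rating + rnorm(n, sd = 40)
rate   <- 0.33 - 0.04 * rating + rnorm(n, sd = 0.02)
toy    <- data.frame(rate, credit, rating)

# Nested models: the second adds the rating to the first.
m1 <- lm(rate ~ credit, data = toy)
m2 <- lm(rate ~ credit + rating, data = toy)

r_squared <- c(credit_only = summary(m1)$r.squared,
               with_rating = summary(m2)$r.squared)
print(round(r_squared, 3))
```

For nested least-squares models, R-squared can never decrease when a predictor is added, so the size of the jump is what matters, not the direction.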
The models I put together are simplistic at this point compared to the number
of variables associated with peer to peer lending but still insightful. After
I get more experience with model development beyond linear regression I would
like to revisit this analysis.
I chose this plot because it showed a difference between the Credit Score and
the Prosper Rating. It was surprising that a “Poor” credit score would
appear in the higher Prosper Ratings.
For this plot I looked at loans before 2010. I took the sqrt of the Monthly
Loan Payment to adjust for the skewness in the data. It was interesting to
see the median monthly payments for loans in good standing and loans in default
drift apart as the monthly payment rose. From this chart you can see that a
lower monthly payment has an impact on keeping the loan in good standing.
For this plot I looked at loans originated after 2010. The graph shows that
the monthly payment is currently correlated with the Loan Amount rather than
with whether a borrower can consistently make monthly payments. (The correlation
between Monthly Loan Payment and Loan Original Amount when calculated with
cor.test was .91.) As loan amounts increase, monthly payments increase.
Prosper already made an adjustment in 2010 to keep payments lower. It would
be interesting for them to model whether adding more options for Term and
monthly payment amounts would further reduce the post-2010 10% default rate.
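A correlation test of this kind can be sketched with `cor.test()` on simulated values. The data below are synthetic and the slope and noise level are arbitrary, so the estimate will not match the .91 reported for the Prosper data.

```r
set.seed(1)  # synthetic data only; not the Prosper dataset

# Made-up loan amounts and payments that rise roughly linearly with them.
loan_amount     <- runif(200, min = 1000, max = 35000)
monthly_payment <- 0.03 * loan_amount + rnorm(200, sd = 80)

# Pearson correlation test (the default method of cor.test).
result <- cor.test(loan_amount, monthly_payment)

print(result$estimate)  # sample correlation coefficient
print(result$conf.int)  # 95% confidence interval for the correlation
```

`cor.test()` reports a confidence interval and p-value alongside the coefficient, which `cor()` alone does not.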
========================================================
For this exploratory data analysis I decided to take on one of the larger
data sets. With 81 variables to explore, it was easy to get off track and
look into each of the variables. It took a while to figure out which variables
were important and I was afraid to miss an insight. I made a lot of plots as a
result and many of them are not included because they were not that useful. I
decided to keep my focus on questions I thought Prosper’s customers would
care about.
I was very surprised at the Peer to Peer business environment pre-2010 that
led to action from the SEC. I found it incredibly helpful to have domain
knowledge of Peer to Peer Lending and an overview of the business. Both
provided much needed context for what I was seeing in the data.
The interesting models I would like to build are more complex and outside
the scope of EDA. For example, the Prosper Score is an internal scorecard that
estimates the probability of a loan going “bad,” where “bad” means
going 60+ days past due within the first twelve months from
the date of loan origination. This isn’t helpful for loans with a 3 year or
5 year term. The lenders would probably like a better score to predict whether
a loan is likely to get paid off. It would also be good to know if
focusing on lower monthly payments would help keep the loans in good standing.
The loans are not secured, Prosper has discontinued use of a secondary market,
and the lender is bearing the risk.
What did go well is that my EDA skills have improved. I also have a much better
understanding of when to apply a transform to a variable and what plots to use
in what circumstances. The example EDA projects were inspiring and I was
able to use a few tips. I am excited to take on machine learning. As a future
project, I would like to develop a model that attempts to forecast what types
of loans are likely to stay in good standing.